A DHT-based Backup System

Authors

  • Emil Sit
  • Josh Cates
Abstract

Distributed hash tables (DHTs) have been proposed as a way to simplify the construction of large-scale distributed applications (e.g. [1, 6]). DHTs are completely decentralized systems that provide block storage on a changing collection of nodes spread throughout the Internet. Each block is identified by a unique key. DHTs spread the load of storing and serving blocks across all of the active nodes and keep the blocks available as nodes join and leave the system.

This paper presents the design and implementation of a cooperative off-site backup system, Venti-DHash. Venti-DHash is based on a DHT infrastructure and is designed to support recovery of data after a disaster by keeping regular snapshots of file systems distributed off-site, on peers on the Internet. Whereas conventional backup systems incur significant equipment costs, manual effort and high administrative overhead, we hope that a distributed backup system can alleviate these problems, making backups easy and feasible. By building this system on top of a DHT, the backup application inherits the properties of the DHT, and serves to evaluate the feasibility of using a DHT to build large-scale applications.

The backup system is based around the Venti archival storage system [9], replacing the storage back-end with the DHash distributed hash table [5]. Venti-DHash operates as an archiver that takes complete file system snapshots at the block level. Each unique block is stored only once, even across snapshots. DHash is used to balance storage and network load, as well as to provide adequate availability of blocks. A number of changes were made to the internals of DHash in order to meet our desired performance and availability goals. Our improved version of DHash is a DHT with good read and write performance, and five nines (99.999%) of availability per block, assuming an average node reliability of 90%. The resulting system is now being tested by running backups of our primary file server.

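
To give a feel for the availability figure quoted above, here is a back-of-envelope sketch. The abstract does not say which redundancy scheme DHash uses to reach five nines, so the model below is an assumption: plain whole-block replication on nodes that fail independently, each available with probability 0.9. The function names are hypothetical and exist only for this illustration.

```python
from fractions import Fraction


def block_availability(node_availability, replicas):
    """Probability that at least one of `replicas` copies is reachable.

    Assumes (for illustration only) independent node failures and
    whole-block replication; exact arithmetic avoids float rounding.
    """
    p_fail = 1 - Fraction(node_availability)
    return 1 - p_fail ** replicas


def replicas_needed(node_availability, target):
    """Smallest replica count whose availability meets `target`."""
    p = Fraction(node_availability)
    target = Fraction(target)
    if not (0 < p <= 1) or not (0 <= target < 1):
        raise ValueError("need 0 < node_availability <= 1 and 0 <= target < 1")
    r = 1
    while block_availability(p, r) < target:
        r += 1
    return r


if __name__ == "__main__":
    # 90% node availability, five-nines target per block.
    print(replicas_needed("0.9", "0.99999"))
    print(block_availability("0.9", 5))
```

Under these assumptions, five copies are the minimum that reaches 99.999%, since 1 - 0.1^5 = 0.99999 while four copies give only 0.9999. A real deployment with correlated failures or a coded redundancy scheme would need a different calculation.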
The rest of the paper is structured as follows. Section 2 briefly surveys related work. The design of the backup system is presented in Section 3. Next, we describe how DHash was changed to achieve the desired performance and availability goals in Section 4. Section 5 describes some preliminary performance benchmarks and analysis we have conducted on our prototype. Finally, we conclude in Section 6.
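
The block-level deduplication mentioned in the abstract ("each unique block is stored only once, even across snapshots") can be illustrated with a small content-addressed store. This is a minimal sketch, not the Venti-DHash implementation: the `BlockStore` class, the `snapshot` helper, the use of SHA-1 as the block key, the fixed block size, and the in-memory dictionary standing in for the DHT are all assumptions made for this example.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store: a block's key is the hash of its bytes.

    A dict stands in for the distributed hash table (an assumption of
    this sketch); identical blocks map to the same key and are kept once.
    """

    def __init__(self):
        self._blocks = {}

    def put(self, data):
        key = hashlib.sha1(data).hexdigest()
        # setdefault leaves an existing copy untouched, so a repeated
        # block costs no additional storage.
        self._blocks.setdefault(key, data)
        return key

    def get(self, key):
        return self._blocks[key]

    def __len__(self):
        return len(self._blocks)


def snapshot(store, image, block_size=8192):
    """Split `image` into fixed-size blocks, store each, return the key list."""
    return [
        store.put(image[i:i + block_size])
        for i in range(0, len(image), block_size)
    ]


if __name__ == "__main__":
    store = BlockStore()
    day1 = b"a" * 8 + b"b" * 8
    day2 = b"a" * 8 + b"c" * 8  # only the second block changed
    keys1 = snapshot(store, day1, block_size=8)
    keys2 = snapshot(store, day2, block_size=8)
    print(len(store))  # number of distinct blocks held
    print(b"".join(store.get(k) for k in keys1) == day1)
```

In this toy run the two snapshots share their first block, so the store holds three blocks rather than four, and either snapshot can still be rebuilt from its key list.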

Similar resources

EFFECT OF FIVE ALPHA DIHYDROTESTOSTERONE (5α-DHT) ON CYTOKINE PRODUCTION BY PERITONEAL MACROPHAGES OF NZB/BALBc MICE

One of the mechanisms involved in the regulation of the immune system by steroid hormones could be the monocytic-macrophage system. In this study, the effect of the male hormone 5α-DHT on cytokine release by peritoneal macrophages (mΦ) of male and female NZB/BALBc mice was investigated. Macrophages from male mice activated with LPS produced a greater amount of IL-1β (21.8%) (p<0.05) and IL-...

Flexible replica placement for optimized P2P backup on heterogeneous, unreliable machines

P2P architecture is a viable option for enterprise backup. In contrast to dedicated backup servers, nowadays a standard solution, making backups directly on organization’s workstations should be cheaper as existing hardware is used; more efficient as there is no single bottleneck server; and more reliable as the machines can be geographically dispersed. We present an architecture of a p2p backu...

Building a reliable and high-performance content-based publish/subscribe system

Provisioning reliability in a high-performance content-based publish/subscribe system is a challenging problem. The inherent complexity of content-based routing makes message loss detection and recovery, and network state recovery extremely complicated. Existing proposals either try to reduce the complexity of handling failures in a traditional network architecture, which only partially address...

Self Chord-Achieving Load Balancing In Peer To Peer Network

Cloud computing technology has been widely applied in e-business and e-education. A cloud computing platform is a set of scalable, large-scale data server clusters that provides computing and storage services to customers. Cloud storage is a relatively basic and widely applied service which can provide users with stable, massive data storage space. Our research shows that the architecture of c...

Replica placement for p2p redundant data storage on unreliable, non-dedicated machines

P2P architecture appears to fit for enterprise backup. In contrast to dedicated backup servers, nowadays a standard solution, making backups directly on organization’s workstations should be cheaper (as existing hardware is used) and more efficient (as there is no single bottleneck server). However, non-dedicated machines cause other challenges. Update propagation algorithms must take into acco...

Publication date: 2003